Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added option to specify weights in xr.corr() and xr.cov() #8527

Merged
merged 20 commits into from
Dec 12, 2023

Conversation

lluritu
Copy link
Contributor

@lluritu lluritu commented Dec 6, 2023

Added option to specify weights in xr.corr() and xr.cov().
This functionality is useful e.g. to compute correlations of spatial fields when the data is in a lat-lon grid, because grid points closer to the poles have a smaller area and need to have less weight.

  • Closes weighted for xr.corr #4768
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

Copy link

welcome bot commented Dec 6, 2023

Thank you for opening this pull request! It may take us a few days to respond here, so thank you for being patient.
If you have questions, some answers may be found in our contributing guidelines.

@TomNicholas
Copy link
Member

TomNicholas commented Dec 6, 2023

This is great, thanks @lluritu ! And welcome to xarray!

Copy link
Member

@TomNicholas TomNicholas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely need a few tests for this too :)

xarray/core/computation.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR @lluritu !

One comment — we already have a .weighted method — could we use that to make this more orthogonal?

It's not a perfect fit, since folks will need to call xr.corr(foo.weighted(bar), baz), rather than foo.weighted(bar).sum().

@lluritu
Copy link
Contributor Author

lluritu commented Dec 7, 2023

You're welcome!
I don't think what you propose is easy or practical (but I might be missing your point here?). On one hand the function needs both the original and the weighted versions to compute the anomalies (demenead_da). On the other hand, both da and db need to use the same weights, so would be weird letting users have the freedom to specify different weights for the two inputs.

Thanks a lot for the PR @lluritu !

One comment — we already have a .weighted method — could we use that to make this more orthogonal?

It's not a perfect fit, since folks will need to call xr.corr(foo.weighted(bar), baz), rather than foo.weighted(bar).sum().

@lluritu
Copy link
Contributor Author

lluritu commented Dec 7, 2023

Could you please help me a bit w that? e.g. pointing me to current tests of the cov, corr and _cov_corr functions would be a good starting point. Is there a way to test a previous version of a function with a modified one to make sure I'm not breaking the already existing functionality?

We definitely need a few tests for this too :)

@TomNicholas
Copy link
Member

Could you please help me a bit w that? e.g. pointing me to current tests of the cov, corr and _cov_corr functions would be a good starting point. Is there a way to test a previous version of a function with a modified one to make sure I'm not breaking the already existing functionality?

Of course 🙂 ! Tests for this kind of "general-purpose computation method" are in xarray/tests/test_computation.py, and the tests for cov/corr start here

def test_lazy_corrcov(

On one hand the function needs both the original and the weighted versions to compute the anomalies (demenead_da).

foo.weighted(bar) already exists in xarray and returns a DataArrayWeighted object that contains all the above information.

both da and db need to use the same weights, so would be weird letting users have the freedom to specify different weights for the two inputs.

I agree though this seems like a fairly strong argument against @max-sixty 's specific proposed API above, since as far as I know we have no way to associate one set of weights with two DataArrays?

@max-sixty
Copy link
Collaborator

I agree though this seems like a fairly strong argument against @max-sixty 's specific proposed API above, since as far as I know we have no way to associate one set of weights with two DataArrays?

Yeah, the logic would be "use weights from one of the input"

i agree it has that downside.

When we first implemented this, I was originally keener on da_a.cov(da_b) IIRC, which would have worked well here. But we went with xr.cov, so it fits less well.

I'm fine going ahead with the weights parameter. It's less orthogonal, but also more obvious.

Copy link
Collaborator

@max-sixty max-sixty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking pretty good! I think all it needs is a proper test


Separately from this PR, the approach seems fundamentally inelegant, but I don't think there's a better one. The only thing I can think of is allowing .weighted objects to do elementwise arithmetic operations on the underlying array. Then I think that xr.cov(da.weighted(foo), da2.weighted(foo)) would "just work". But also seems like that's possibly playing with fire; I'd have to think more.

@lluritu
Copy link
Contributor Author

lluritu commented Dec 8, 2023

We definitely need a few tests for this too :)

The new xr.corr and xr.cov pass all the previous tests.
Moreover I have added two additional tests that use the weights option.

@max-sixty
Copy link
Collaborator

Could we add a line to the whatsnew? I merged main since this will be the first commit for the first 2024 release!

Then we can hit the button!

@lluritu
Copy link
Contributor Author

lluritu commented Dec 11, 2023

Doctest and mypy produce errors.
For mypy it gives:

xarray/core/computation.py: note: In function "_cov_corr":
xarray/core/computation.py:1538: error: Incompatible return value type (got "DataArray", expected "T_DataArray")  [return-value]
xarray/core/computation.py:1549: error: Incompatible return value type (got "DataArray", expected "T_DataArray")  [return-value]

any hint on how to fix that?

@max-sixty
Copy link
Collaborator

any hint on how to fix that?

I thought I fixed this yesterday, but actually I pushed to the wrong place! So I now pushed to this branch and it should work!

I'll also merge main which should fix the doctests.

@max-sixty max-sixty enabled auto-merge (squash) December 12, 2023 00:00
@max-sixty max-sixty merged commit 562f2f8 into pydata:main Dec 12, 2023
25 checks passed
Copy link

welcome bot commented Dec 12, 2023

Congratulations on completing your first pull request! Welcome to Xarray! We are proud of you, and hope to see you again! celebration gif

dcherian added a commit to dcherian/xarray that referenced this pull request Dec 18, 2023
* main: (26 commits)
  Filter null values before plotting (pydata#8535)
  Update concat.py (pydata#8538)
  Add getitem to array protocol (pydata#8406)
  Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
  Filter out doctest warning (pydata#8539)
  Bump actions/setup-python from 4 to 5 (pydata#8540)
  Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
  Add Cumulative aggregation (pydata#8512)
  dev whats-new
  Whats-new for 2023.12.0 (pydata#8532)
  explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
  Add `eval` method to Dataset (pydata#7163)
  Deprecate ds.dims returning dict (pydata#8500)
  test and fix empty xindexes repr (pydata#8521)
  Remove PR labeler bot (pydata#8525)
  Hypothesis strategy for generating Variable objects (pydata#8404)
  Use numbagg for `rolling` methods (pydata#8493)
  Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514)
  fix RTD docs build (pydata#8519)
  Fix type of `.assign_coords` (pydata#8495)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Dec 20, 2023
* main: (58 commits)
  Adapt map_blocks to use new Coordinates API (pydata#8560)
  add xeofs to ecosystem.rst (pydata#8561)
  Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
  Generalize cumulative reduction (scan) to non-dask types (pydata#8019)
  Filter null values before plotting (pydata#8535)
  Update concat.py (pydata#8538)
  Add getitem to array protocol (pydata#8406)
  Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
  Filter out doctest warning (pydata#8539)
  Bump actions/setup-python from 4 to 5 (pydata#8540)
  Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
  Add Cumulative aggregation (pydata#8512)
  dev whats-new
  Whats-new for 2023.12.0 (pydata#8532)
  explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
  Add `eval` method to Dataset (pydata#7163)
  Deprecate ds.dims returning dict (pydata#8500)
  test and fix empty xindexes repr (pydata#8521)
  Remove PR labeler bot (pydata#8525)
  Hypothesis strategy for generating Variable objects (pydata#8404)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Jan 4, 2024
commit 0a0f800
Merge: 33c8033 41d33f5
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:42:51 2024 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

commit 33c8033
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:40:42 2024 -0700

    Don't skip for resampling

commit d7be352
Author: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date:   Wed Jan 3 03:24:13 2024 +0000

    [pre-commit.ci] auto fixes from pre-commit.com hooks

    for more information, see https://pre-commit.ci

commit d13fa0e
Author: Deepak Cherian <[email protected]>
Date:   Tue Jan 2 20:23:43 2024 -0700

    Apply suggestions from code review

    Co-authored-by: Michael Niklas  <[email protected]>

commit dd6ea53
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 19:29:40 2023 -0700

    Silence more warnings

commit 44e5a41
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 19:21:06 2023 -0700

    minimize test mods

commit 94c1c1f
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:55:46 2023 -0700

    Add tests for pydata#8263

commit 0ab4eb6
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:47:41 2023 -0700

    Fix typing

commit a064430
Merge: d6a3f2d 03ec3cb
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:47:04 2023 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

    * main:
      Fix mypy type ignore (pydata#8564)
      Support for the new compression arguments. (pydata#7551)
      FIX: reverse index output of bottleneck move_argmax/move_argmin functions (pydata#8552)
      Adapt map_blocks to use new Coordinates API (pydata#8560)
      add xeofs to ecosystem.rst (pydata#8561)
      Offer a fixture for unifying DataArray & Dataset tests (pydata#8533)
      Generalize cumulative reduction (scan) to non-dask types (pydata#8019)

commit d6a3f2d
Author: Deepak Cherian <[email protected]>
Date:   Thu Dec 21 18:46:50 2023 -0700

    Fix generator for aggregations

commit 97f1695
Author: Deepak Cherian <[email protected]>
Date:   Tue Dec 19 10:58:11 2023 -0700

    Fix docs

commit 5b33b98
Author: Deepak Cherian <[email protected]>
Date:   Sun Dec 17 20:35:53 2023 -0700

    fix whats-new

commit 80b2b36
Author: Deepak Cherian <[email protected]>
Date:   Sun Dec 17 20:26:17 2023 -0700

    Reduce more warnings

commit 5f6f4ea
Merge: a57d4ae 2971994
Author: Deepak Cherian <[email protected]>
Date:   Sat Dec 16 20:33:13 2023 -0700

    Merge branch 'main' into depr-groupby-squeeze-2

    * main: (26 commits)
      Filter null values before plotting (pydata#8535)
      Update concat.py (pydata#8538)
      Add getitem to array protocol (pydata#8406)
      Added option to specify weights in xr.corr() and xr.cov() (pydata#8527)
      Filter out doctest warning (pydata#8539)
      Bump actions/setup-python from 4 to 5 (pydata#8540)
      Point users to where in their code they should make mods for Dataset.dims (pydata#8534)
      Add Cumulative aggregation (pydata#8512)
      dev whats-new
      Whats-new for 2023.12.0 (pydata#8532)
      explicitly skip using `__array_namespace__` for `numpy.ndarray` (pydata#8526)
      Add `eval` method to Dataset (pydata#7163)
      Deprecate ds.dims returning dict (pydata#8500)
      test and fix empty xindexes repr (pydata#8521)
      Remove PR labeler bot (pydata#8525)
      Hypothesis strategy for generating Variable objects (pydata#8404)
      Use numbagg for `rolling` methods (pydata#8493)
      Bump pypa/gh-action-pypi-publish from 1.8.10 to 1.8.11 (pydata#8514)
      fix RTD docs build (pydata#8519)
      Fix type of `.assign_coords` (pydata#8495)
      ...

commit a57d4ae
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:36:04 2023 -0700

    Test one more warning

commit bf8139d
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:33:45 2023 -0700

    Update xarray/tests/test_groupby.py

commit 4e9a063
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 21:10:14 2023 -0700

    Set squeeze=None for Dataset too

commit c2e576e
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:54:17 2023 -0700

    Fix first, last

commit 6d8e822
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:46:21 2023 -0700

    better warning

commit 62c334b
Author: Deepak Cherian <[email protected]>
Date:   Fri Dec 1 20:45:17 2023 -0700

    silence warnings

commit b7805a8
Author: dcherian <[email protected]>
Date:   Tue Aug 15 10:54:25 2023 -0600

    Deprecate `squeeze` in GroupBy.

    Closes pydata#2157
@lee1043
Copy link

lee1043 commented Jan 8, 2024

This is indeed a great contribution, thank you @lluritu! Is the new version of xarray containing this capability released, or soon will be?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

weighted for xr.corr
5 participants